Connectivity and expression in protein networks: 
Proteins in a complex are uniformly expressed 
We explore the interplay between the protein-protein interactions network and the expression of 
the interacting proteins. It is shown that interacting proteins are expressed in significantly more 
similar cellular concentrations. This is largely due to interacting pairs which are part of protein 
complexes. We solve a generic model of complex formation and show explicitly that complexes form 
most efficiently when their members have roughly the same concentrations. Therefore, the observed 
similarity in interacting protein concentrations could be attributed to optimization for efficiency of 
complex formation. 
I. INTRODUCTION 

Statistical analysis of the topology of real-world networks has 
attracted much interest in recent years, proving to supply 
new insights and ideas to many diverse fields. In partic- 
ular, the protein-protein interaction network, combining 
many different interactions of proteins within a cell, has 
been the subject of many studies (for a recent review see 
[1]). While this network shares many of the universal 
features of natural networks, such as the scale-free 
distribution of degrees and the small-world characteristics, 
it also has some unique features. One of the most 
important of these is arguably the fact that the protein 
interactions underlying this network can be separated 
into two roughly disjoint classes. One of them relates 
to transmission of information within the cell: protein 
A interacts with protein B and changes it, by a confor- 
mational or chemical transformation. The usual scenario 
after such an interaction is that the two proteins disasso- 
ciate shortly after the completion of the transformation. 
On the other hand, many protein interactions are aimed 
at the formation of a protein complex. In this mode of 
operation the physical attachment of two or more proteins 
is needed in order to allow for the biological activity 
of the combined complex, and is typically stable over 
relatively long time scales. 

The yeast Saccharomyces cerevisiae serves as the 
model organism for most analyses of the protein-protein 
interaction network. The complete set of genes 
and proteins, with extensive data on gene expression, is 
available for this unicellular organism, accompanied 
by large datasets of protein-protein interactions based on 
a wide range of experimental and computational methods. 
In addition, the intracellular locations and the expression 
levels of most yeast proteins were recently reported. The availability 
of such data enables us to study the relationship between 
network topology and the expression levels of each protein. 

In this work we demonstrate the importance of the distinction 
between different types of protein interaction, 
by highlighting one property which is unique to interactions 
of protein complexes. Combining databases 
of yeast protein interactions with the recently reported 
information on the protein concentration, we find that 
proteins belonging to the same complex tend to have a 
more uniform concentration distribution. We further ex- 
plain this finding by a model of complex formation, show- 
ing that uneven concentrations of the complex members 
result in inefficient complex formation. Surprisingly, in 
some cases increasing the concentration of one of the 
complex ingredients decreases the absolute number of 
complexes formed. Thus, the experimental observation 
of uniform concentrations among complex members can be explained 
in terms of selection for efficiency. 



II. CONCENTRATIONS OF INTERACTING 
PROTEINS 

We start by studying the concentrations of pairs of in- 
teracting proteins, and demonstrate that different types 
of protein-protein interactions differ in their properties. 
For this purpose we use the recently published database 
providing the (average) concentration, as well as 
the localization within the cell, for most of the Saccharomyces 
cerevisiae (baker's yeast) proteins. The 
concentrations $c_i$ (given in arbitrary units) are approximately 
distributed according to a log-normal distribution 
with $\langle \log(c_i) \rangle = 7.89$ and standard deviation 1.53 (Fig. 1). 

Baker's yeast serves as a model organism for most 
of the protein-protein interaction network studies. Thus 
a set of many of its protein-protein interactions is also 
readily available. Here we use a dataset of recorded 
yeast protein interactions, given with various levels of 
confidence. The dataset lists about 80000 interactions 
between approximately 5300 of the yeast proteins 
(or about 12000 interactions between 2600 proteins when 
excluding interactions of the lowest confidence). These 
interactions were deduced by many different experimen- 
tal methods, and describe different biological relations 
FIG. 1: (Color online) Distribution of the logarithm of the 
protein concentration (in units of protein molecules per cell) 
for all measured proteins within the yeast cell. 

between the proteins involved. The protein interaction 
network exhibits a high level of clustering (clustering 
coefficient ≈ 0.39). This is partly due to the existence of 
many sets of proteins forming complexes, where each of 
the complex members interacts with many other mem- 
bers. 

Combining these two databases, we study the correla- 
tion between the (logarithm of) concentrations of pairs 
of interacting proteins. In order to gain insight into the 
different components of the network, we perform this cal- 
culation separately for the interactions deduced by dif- 
ferent experimental methods. For simplicity, we report 
here the results after excluding the interactions anno- 
tated as low-confidence (many of which are expected to 
be false-positives). We have explicitly checked that their 
inclusion does not change the results qualitatively. The 
results are summarized in Table I and show a significant 
correlation between the expression levels of interacting 
proteins. 
TABLE I: (Color online) Correlation coefficients between 
the logarithm of the concentrations of interacting proteins. 
Only interactions of medium or high confidence were included. 
The statistical significance of the results was estimated by 
randomly permuting the concentrations of the proteins and 
reevaluating the correlation on the same underlying network, 
repeated for 1,000 different permutations. The mean correlation 
of the randomly permuted networks was zero, and the 
standard deviation (STD) is given. The P-value was calculated 
assuming a Gaussian distribution of the correlation values 
for the randomized networks. We have verified that the 
distributions of the 1,000 realizations calculated are roughly 
Gaussian. 
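
The permutation procedure described in the caption can be sketched as follows. This is a minimal illustration in Python and not the authors' code: the function names, the symmetrized treatment of each pair, and the toy network with its toy log-concentrations are all assumptions made for the example. On such a small toy network the null mean need not be exactly zero.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def edge_correlation(edges, log_conc):
    """Correlation of log-concentrations across interacting pairs.

    Assumption: each undirected pair is counted in both orientations, so
    the result does not depend on the arbitrary order within a pair.
    """
    xs = [log_conc[u] for u, v in edges] + [log_conc[v] for u, v in edges]
    ys = [log_conc[v] for u, v in edges] + [log_conc[u] for u, v in edges]
    return pearson(xs, ys)


def permutation_test(edges, log_conc, n_perm=1000, seed=0):
    """Shuffle concentrations among proteins while keeping the network fixed."""
    rng = random.Random(seed)
    observed = edge_correlation(edges, log_conc)
    proteins = list(log_conc)
    values = [log_conc[p] for p in proteins]
    null = []
    for _ in range(n_perm):
        rng.shuffle(values)
        null.append(edge_correlation(edges, dict(zip(proteins, values))))
    mean, std = statistics.fmean(null), statistics.stdev(null)
    z = (observed - mean) / std
    # One-sided P-value under a Gaussian approximation of the null values.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return observed, mean, std, p_value


# Hypothetical toy data: two tight groups and one isolated pair.
toy_log_conc = {"p1": 5.0, "p2": 5.2, "p3": 5.1, "p4": 9.0,
                "p5": 9.3, "p6": 8.8, "p7": 7.0, "p8": 7.1}
toy_edges = [("p1", "p2"), ("p2", "p3"), ("p1", "p3"), ("p4", "p5"),
             ("p5", "p6"), ("p4", "p6"), ("p7", "p8")]
print(permutation_test(toy_edges, toy_log_conc, n_perm=200, seed=1))
```

Keeping the network fixed and shuffling only the concentrations means that the degree structure of the network is identical in the real and the randomized cases, so any difference in correlation reflects the pairing of concentrations rather than the topology.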

The strongest correlation is seen for the subset of pro- 
tein interactions which were derived from synexpression, 
i.e. inferred from correlated mRNA expression. This re- 
sult confirms the common expectation that genes with 
correlated mRNA expression would yield correlated protein 
levels as well. However, our results show that 
interacting protein pairs whose interaction was deduced 
by other methods exhibit significant positive correlation 
as well. The effect is weak for the yeast two-hybrid (Y2H) 
method, which includes all possible physical interactions 
between the proteins (and is also known to suffer 
from many artifacts and false positives), but stronger for 
the HMS (high-throughput mass spectrometry) and 
TAP (tandem-affinity purification) interactions, which 
correspond to actual physical interactions (i.e., experimental 
evidence that the proteins actually bind together 
in vivo). These experimental methods are specifically 
designed to detect cellular protein complexes. The above 
results thus hint that the overall correlation between con- 
centrations of interacting proteins is due to the tendency 
of proteins which are part of a stable complex to have 
similar concentrations. 

The same picture emerges when one counts the num- 
ber of interactions a protein has with other proteins of 
similar concentration, compared to the number of interactions 
with randomly chosen proteins. A protein interacts, 
on average, with 0.49% of the proteins with similar 
expression level (i.e., |log-difference| < 1), as opposed 
to only 0.36 ± 0.01% of random proteins, in agreement 
with the above observation of complex members having 
similar protein concentrations. 

In order to directly test this hypothesis (i.e. that pro- 
teins in a complex have similar concentrations), we use 
existing datasets of protein complexes and study the uni- 
formity of concentrations of members of each complex. 
The complex data were taken from a published dataset, and were found 
to have many TAP interactions within them. As a measure 
of the uniformity of the expression levels within each 
complex, we calculate the variance of the (logarithm of 
the) concentrations among the members of each complex. 
The average variance (over all complexes) is found to be 
2.35, compared to 2.88 ± 0.07 and 2.74 ± 0.11 for randomized 
complexes in two different randomization schemes 
(see Fig. 2(a)), confirming that the concentrations of complex 
members tend to be more uniform than a random 
set of proteins. 
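
The comparison between real and randomized complexes can be sketched in a few lines. This is a hedged illustration only: the randomization used here (permuting the concentration values among the proteins while keeping the membership lists fixed) is a simplification chosen for brevity and is not identical to either of the two schemes used in the text; the use of the population variance, the function names and the toy data are likewise assumptions.

```python
import random
import statistics


def mean_within_complex_variance(complexes, log_conc):
    """Average over complexes of the variance of member log-concentrations.

    Assumption: the population variance is used; the source does not say
    which variance estimator was applied.
    """
    per_complex = [
        statistics.pvariance([log_conc[p] for p in members])
        for members in complexes
    ]
    return statistics.fmean(per_complex)


def randomized_baseline(complexes, log_conc, n_rand=200, seed=0):
    """Mean and spread of the same statistic after shuffling concentrations."""
    rng = random.Random(seed)
    proteins = sorted({p for members in complexes for p in members})
    values = [log_conc[p] for p in proteins]
    results = []
    for _ in range(n_rand):
        rng.shuffle(values)
        shuffled = dict(zip(proteins, values))
        results.append(mean_within_complex_variance(complexes, shuffled))
    return statistics.fmean(results), statistics.stdev(results)


# Hypothetical toy data: members of each complex have similar levels.
toy_complexes = [["a", "b", "c"], ["d", "e", "f"], ["c", "g"]]
toy_log_conc = {"a": 5.0, "b": 5.3, "c": 5.1, "d": 9.0,
                "e": 9.2, "f": 8.7, "g": 5.4}

real = mean_within_complex_variance(toy_complexes, toy_log_conc)
rand_mean, rand_std = randomized_baseline(toy_complexes, toy_log_conc)
print(real, rand_mean, rand_std)
```

A real value well below the randomized mean, measured in units of the randomized standard deviation, is the signature of uniform expression within complexes.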




FIG. 2: (Color online) (a) Variance of the logarithm of the 
protein expression levels (in units of molecules per cell) for 
members of real complexes, averaged over all complexes, 
compared with the averaged variance of the complexes after 
randomization of their members, letting each protein participate 
on average in the same number of complexes 
(random(1)), as well as randomized complexes where the number 
of complexes each protein participates in is kept fixed 
(random(2)). Real complexes have a lower variance, indicating 
higher uniformity in the expression levels of the underlying 
proteins. (b) Same as (a) for expression levels in pentagons 
(see text). 

As another test, we study a different yeast protein interaction 
network, the one from the DIP database. 
We look for fully-connected sub-graphs of size 5, which 
are expected to represent complexes, sub-complexes or 
groups of proteins working together. The network contains 
approximately 1600 (highly overlapping) such pentagons, 
made of about 300 different proteins. The variance 
of the logarithm of the concentrations of the members of each 
pentagon, averaged over the different pentagons, is 
1.234. As before, this is a significantly low variance compared 
with random sets of five proteins (average variance 
1.847 ± 0.02 and 1.718 ± 0.21), see Fig. 2(b). 
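
The search for fully-connected sub-graphs of size 5 can be illustrated with a small pure-Python enumeration. This is a sketch under stated assumptions, not the procedure actually used for the DIP network: the function names and the toy graph are hypothetical, and the simple recursive search is adequate only for sparse networks of moderate size.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of pairs (self-loops ignored)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def cliques_of_size(adj, k):
    """Return every set of k mutually connected nodes, each exactly once."""
    order = {node: i for i, node in enumerate(sorted(adj))}
    found = []

    def extend(clique, candidates):
        if len(clique) == k:
            found.append(tuple(clique))
            return
        for node in sorted(candidates, key=order.get):
            # Keep only common neighbours that come later in the ordering,
            # so that each clique is generated in sorted order only.
            later = {c for c in candidates & adj[node]
                     if order[c] > order[node]}
            extend(clique + [node], later)

    extend([], set(adj))
    return found


# Hypothetical toy graph: a fully connected group of six plus one extra edge.
toy_edges = list(combinations("ABCDEF", 2)) + [("F", "G")]
toy_adj = build_adjacency(toy_edges)
pentagons = cliques_of_size(toy_adj, 5)
print(len(pentagons), "fully connected 5-node sub-graphs")
```

As in the text, the sub-graphs found this way overlap heavily: a fully connected group of six nodes alone contributes six distinct pentagons.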

Finally, we have used mRNA expression data and 
looked for correlated expression patterns within com- 
plexes. We have calculated the correlation coefficient be- 
tween the expression data of the two proteins for each 
pair of proteins which are part of the same pentagon. 
The average correlation coefficient between proteins be- 
longing to the same fully-connected pentagon is 0.15 com- 
pared to 0.056 ± 0.005 for a random pair. 

In summary, combination of a number of yeast pro- 
tein interaction networks with protein and mRNA ex- 
pression data yields the conclusion that interacting pro- 
teins tend to have similar concentrations. The effect is 
stronger when focusing on interactions which represent 
stable physical interactions, i.e. complex formation, sug- 
gesting that the overall effect is largely due to the uni- 
formity in the concentrations of proteins belonging to 
the same complex. In the next Section we explain this 
finding by a model of complex formation. We show, on 
general grounds, that complex formation is more effective 
when the concentrations of its constituents are roughly the 
same. Thus, the observation made in the present Section 
can be explained by selection for efficiency of complex 
formation. 



III. MODEL 

Here we study a model of complex formation, and ex- 
plore the effectiveness of complex production as a func- 
tion of the relative abundances of its constituents. For 
simplicity, we start with a detailed analysis of three-component 
complex production, which already captures 
most of the important effects. 

Denote the concentrations of the three components of 
the complex by A, B and C, and the concentrations of the 
complexes they form by AB, AC, BC and ABC. The lat- 
ter is the concentration of the full complex, which is the 
desired outcome of the production, while the first three 
describe the different sub-complexes which are formed 
(in this case, each of which is composed of two components). 
Three-body processes, i.e., direct generation (or 
decomposition) of ABC out of A, B and C, can usually 
be neglected, but their inclusion here does not complicate 
the analysis. The resulting set of reaction kinetic 
equations is given by 
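\begin{align}
\frac{dA}{dt} &= -k_{a_{A,B}} A \cdot B - k_{a_{A,C}} A \cdot C - k_{a_{A,BC}} A \cdot BC - k_{a_{A,B,C}} A \cdot B \cdot C \notag \\
&\quad + k_{d_{A,B}} AB + k_{d_{A,C}} AC + (k_{d_{A,BC}} + k_{d_{A,B,C}}) \cdot ABC \tag{1} \\
\frac{dB}{dt} &= -k_{a_{A,B}} A \cdot B - k_{a_{B,C}} B \cdot C - k_{a_{B,AC}} B \cdot AC - k_{a_{A,B,C}} A \cdot B \cdot C \notag \\
&\quad + k_{d_{A,B}} AB + k_{d_{B,C}} BC + (k_{d_{B,AC}} + k_{d_{A,B,C}}) \cdot ABC \tag{2} \\
\frac{dC}{dt} &= -k_{a_{A,C}} A \cdot C - k_{a_{B,C}} B \cdot C - k_{a_{C,AB}} C \cdot AB - k_{a_{A,B,C}} A \cdot B \cdot C \notag \\
&\quad + k_{d_{A,C}} AC + k_{d_{B,C}} BC + (k_{d_{C,AB}} + k_{d_{A,B,C}}) \cdot ABC \tag{3} \\
\frac{d(AB)}{dt} &= k_{a_{A,B}} A \cdot B + k_{d_{C,AB}} ABC - k_{d_{A,B}} AB - k_{a_{C,AB}} C \cdot AB \tag{4} \\
\frac{d(AC)}{dt} &= k_{a_{A,C}} A \cdot C + k_{d_{B,AC}} ABC - k_{d_{A,C}} AC - k_{a_{B,AC}} B \cdot AC \tag{5} \\
\frac{d(BC)}{dt} &= k_{a_{B,C}} B \cdot C + k_{d_{A,BC}} ABC - k_{d_{B,C}} BC - k_{a_{A,BC}} A \cdot BC \tag{6} \\
\frac{d(ABC)}{dt} &= k_{a_{A,BC}} A \cdot BC + k_{a_{B,AC}} B \cdot AC + k_{a_{C,AB}} C \cdot AB + k_{a_{A,B,C}} A \cdot B \cdot C \notag \\
&\quad - (k_{d_{A,BC}} + k_{d_{B,AC}} + k_{d_{C,AB}} + k_{d_{A,B,C}}) \cdot ABC \tag{7}
\end{align}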



where $k_{a_{x,y}}$ ($k_{d_{x,y}}$) are the association (dissociation) rates 
of the subcomponents $x$ and $y$ to form the complex $xy$. 
Denoting the total number of type A, B and C particles 
by $A_0$, $B_0$ and $C_0$, respectively, we may write the conservation 
of material equations: 

\begin{align}
A + AB + AC + ABC &= A_0 \tag{8} \\
B + BC + AB + ABC &= B_0 \tag{9} \\
C + AC + BC + ABC &= C_0 \tag{10}
\end{align}

We look for the steady-state solution of these equa- 
tions, where all time derivatives vanish. For simplicity, 
we consider first the totally symmetric situation, where 
all the ratios of association coefficients to their corresponding 
dissociation coefficients are equal, i.e., the ratios 
$k_{d_{x,y}}/k_{a_{x,y}}$ are all equal to $X_0$ and $k_{d_{x,y,z}}/k_{a_{x,y,z}} = X_0^2$, 
where $X_0$ is a constant with concentration units. 
In this case, measuring all concentrations in units of $X_0$, 
all the reaction equations are solved by the substitutions 
$AB = A \cdot B$, $AC = A \cdot C$, $BC = B \cdot C$ and $ABC = A \cdot B \cdot C$, 
and one needs only to solve the material conservation 
equations, which take the form: 

\begin{align}
A + A \cdot B + A \cdot C + A \cdot B \cdot C &= A_0 \tag{11} \\
B + B \cdot C + A \cdot B + A \cdot B \cdot C &= B_0 \tag{12} \\
C + A \cdot C + B \cdot C + A \cdot B \cdot C &= C_0 \tag{13}
\end{align}
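
The structure of these conservation equations becomes more transparent once each left-hand side is factorized. The short worked example below is added for illustration; the symbols are those of the text, while the choice $A_0 = B_0 = C_0 = 100$ is an arbitrary assumption made here and is not a value taken from the source.

```latex
% Each left-hand side of the conservation equations factorizes:
\begin{align}
A\,(1+B)(1+C) &= A_0, \\
B\,(1+A)(1+C) &= B_0, \\
C\,(1+A)(1+B) &= C_0 .
\end{align}
% Fully matched case A_0 = B_0 = C_0: by symmetry A = B = C, so
\begin{equation}
A\,(1+A)^2 = A_0 .
\end{equation}
% Example: A_0 = 100 gives A = 4, since 4 \cdot 5^2 = 100, and therefore
\begin{equation}
ABC = A \cdot B \cdot C = 4^3 = 64 ,
\end{equation}
% i.e. 64 full complexes are formed out of at most 100 possible ones.
```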

These equations allow for an exact and straight-forward 
(albeit cumbersome) analytical solution. In the follow- 
ing, we explore the properties of this solution. The 
efficiency of the production of ABC, the desired complex, 
can be measured by the number of formed complexes 
relative to the maximal number of complexes possible 
given the initial concentrations of supplied particles, 
$\mathrm{eff} \equiv ABC / \min(A_0, B_0, C_0)$. This definition does not 
take into account the obvious waste resulting from proteins 
of the more abundant species which are bound to be 
left over due to a shortage of proteins of the other species. 
In the following we show that having unmatched concentrations 
of the different complex components results in 
lower efficiency beyond this obvious waste. 

In the linear regime, $A_0, B_0, C_0 \ll 1$, the fraction of 
particles forming complexes is small, and all concentrations 
are just proportional to the initial concentrations. 
The overall efficiency of the process in this regime is extremely 
low, $ABC = A \cdot B \cdot C \approx A_0 \cdot B_0 \cdot C_0 \ll A_0, B_0, C_0$. 
We thus go beyond this trivial linear regime, and focus 
on the region where all concentrations are greater than 
unity. Fig. 3 presents the efficiency as a function of 
$A_0$ and $B_0$, for fixed $C_0$ = 10^. The efficiency is maximized 
when the two more abundant components have 
approximately the same concentration, i.e., for $A_0 \approx B_0$ 
(if $C_0 < A_0, B_0$), for $A_0 \approx C_0$ = 10^ (if $B_0 < A_0, C_0$) 
and for $B_0 \approx C_0$ = 10^ (if $A_0 < B_0, C_0$). 

Moreover, looking at the absolute quantity of the complex 
product, one observes (fixing the concentrations of 
two of the substances, e.g., $B_0$ and $C_0$) that ABC itself has 
a maximum at some finite $A_0$, i.e., there is a finite optimal 
concentration for A particles (see Fig. 4). Adding 
more molecules of type A beyond the optimal concentration 
decreases the amount of the desired complexes. The 
concentration that maximizes the overall production of 
the three-component complex is $A_{0,\max} \approx \max(B_0, C_0)$. 
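
The existence of a finite optimal $A_0$ can be checked numerically. The sketch below is not the analytical solution referred to in the text: it solves the symmetric conservation equations by bisection, using their factorized form $A(1+B)(1+C) = A_0$ (and cyclic permutations). The function names, the grid and the values $B_0 = C_0 = 100$ are assumptions chosen for illustration.

```python
import math


def solve_three_component(a0, b0, c0):
    """Free concentrations (A, B, C), all in units of X0, solving
    A(1+B)(1+C) = A0, B(1+A)(1+C) = B0, C(1+A)(1+B) = C0."""

    def b_and_c(a):
        b_eff = b0 / (1.0 + a)  # equals B(1+C) for this value of A
        c_eff = c0 / (1.0 + a)  # equals C(1+B) for this value of A
        # Eliminating C leaves the quadratic B^2 + p B - b_eff = 0.
        p = 1.0 - b_eff + c_eff
        root = math.sqrt(p * p + 4.0 * b_eff)
        # Pick the numerically stable expression for the positive root.
        b = 2.0 * b_eff / (p + root) if p >= 0 else 0.5 * (root - p)
        return b, c_eff / (1.0 + b)

    lo, hi = 0.0, a0  # the free concentration A lies between 0 and A0
    for _ in range(200):
        a = 0.5 * (lo + hi)
        b, c = b_and_c(a)
        if a * (1.0 + b) * (1.0 + c) > a0:
            hi = a
        else:
            lo = a
    a = 0.5 * (lo + hi)
    b, c = b_and_c(a)
    return a, b, c


def complex_yield(a0, b0, c0):
    """Concentration of the full complex, ABC = A * B * C."""
    a, b, c = solve_three_component(a0, b0, c0)
    return a * b * c


# Illustrative scan: B0 and C0 fixed, A0 varied on a logarithmic grid.
B0 = C0 = 100.0
grid = [10 ** (k / 20) for k in range(0, 81)]  # A0 from 1 to 10^4
yields = [complex_yield(a0, B0, C0) for a0 in grid]
best_a0 = grid[yields.index(max(yields))]
print("A0 maximizing ABC on this grid:", best_a0)
```

On this grid the yield rises with $A_0$, peaks at an $A_0$ of the same order as $\max(B_0, C_0)$, and then decreases as further A particles are added, in line with the behavior described above.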

FIG. 3: (Color online) The efficiency of the synthesis, 
eff = ABC / min($A_0$, $B_0$, $C_0$), as a function of $A_0$ and $B_0$, for $C_0$ = 10^. 
The efficiency is maximized when the two most abundant 
species have roughly the same concentration. 

FIG. 4: (Color online) log(ABC) as a function of $A_0$, $B_0$, 
for fixed $C_0$ = 10^. For each row (fixed $A_0$) or column 
(fixed $B_0$) in the graph, ABC has a maximum, which occurs 
where $A_{0,\max} \approx \max(B_0, C_0)$ (for columns), and 
$B_{0,\max} \approx \max(A_0, C_0)$ (for rows). 

FIG. 5: (Color online) Synthesis efficiency, 
eff = ABC / min($A_0$, $B_0$, $C_0$), as a function of $A_0$ and $B_0$, for different 
values of $\alpha$. $C_0$ is fixed, $C_0 = 100$. The efficiency is 
maximized when the two most abundant substances are of 
roughly the same concentration, regardless of the value of $\alpha$. 

An analytical solution is available for a somewhat 
more general situation, allowing the ratios $k_d/k_a$ 
to take different values for the two-component 
association/dissociation ($X_0$) and the three-component 
association/dissociation ($X_0/\alpha$ and $X_0^2/\alpha$ for 
association/dissociation of the three-component complex 
from/to a two-component complex plus one single particle 
or to three single particles, respectively). It can be 
easily seen that under these conditions, and measuring 
the concentration in units of $X_0$ again, the solution of 
the reaction kinetics equations is given by 

\begin{align}
AB &= A \cdot B, \tag{14} \\
AC &= A \cdot C, \tag{15} \\
BC &= B \cdot C, \tag{16} \\
ABC &= \alpha A \cdot B \cdot C, \tag{17}
\end{align}

and therefore the conservation of material equations take 
the form 

\begin{align}
A + A \cdot B + A \cdot C + \alpha A \cdot B \cdot C &= A_0 \tag{18} \\
B + B \cdot C + A \cdot B + \alpha A \cdot B \cdot C &= B_0 \tag{19} \\
C + A \cdot C + B \cdot C + \alpha A \cdot B \cdot C &= C_0 \tag{20}
\end{align}

These equations are also amenable to an analytical solution, 
and one finds that taking $\alpha$ not equal to 1 does 
not qualitatively change the above results. In particular, 
the synthesis is most efficient when the two highest concentrations 
are roughly equal, see Fig. 5. Note that our 
results hold even for $\alpha \gg 1$, where the three-component 
complex is much more stable than the intermediate AB, 
AC, and BC states. 
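
The step described as easily seen can be spelled out reaction by reaction. The derivation below is a sketch added for illustration, assuming that the steady state is reached with every reversible reaction balanced separately; the net fluxes $J$ are notation introduced here and do not appear in the source.

```latex
% Net flux of each reversible reaction, in units where X_0 = 1:
\begin{align}
J_{A+B \rightleftharpoons AB} &= k_{a_{A,B}} \left( A \cdot B - AB \right), \\
J_{C+AB \rightleftharpoons ABC} &= k_{a_{C,AB}} \left( C \cdot AB - \tfrac{1}{\alpha}\, ABC \right), \\
J_{A+B+C \rightleftharpoons ABC} &= k_{a_{A,B,C}} \left( A \cdot B \cdot C - \tfrac{1}{\alpha}\, ABC \right),
\end{align}
% and similarly for the remaining pairings. Every time derivative in the
% kinetic equations is a signed sum of such fluxes, so all derivatives
% vanish when every flux is zero, which requires
\begin{equation}
AB = A \cdot B, \qquad ABC = \alpha\, C \cdot AB = \alpha\, A \cdot B \cdot C ,
\end{equation}
% consistently for the two-body and the three-body routes to ABC.
```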



We have explicitly checked that the same picture holds 
for 4-component complexes as well: fixing the concentrations 
$B_0$, $C_0$ and $D_0$, the concentration of the target 
complex ABCD is again maximized for $A_{0,\max} \approx 
\max(B_0, C_0, D_0)$. This behavior is expected to hold qualitatively 
for a general number of components and arbitrary 
reaction rates, due to the following argument: 
Assume a complex is to be produced from many constituents, 
one of which (A) is far more abundant than 
the others (B, C, ...). Since A is in excess, almost all B 
particles will bind to A and form AB complexes. Similarly, 
almost all C particles will bind to A to form 
AC complexes. Thus, there will be very few free C particles 
to bind to the AB complexes, and very few free 
B particles available for binding with the AC complexes. 
As a result, one gets relatively many half-done AB and 
AC complexes, but not the desired ABC (note that AB 
and AC cannot bind together). Lowering the concentration 
of A particles allows more B and C particles to 
remain in an unbound state, and thus increases the 
total production rate of ABC complexes (Fig. 6). 

FIG. 6: (Color online) The dimensionless concentrations of 
the complex ABC (solid line), partial complex AB (dashed 
line), and C (dotted line) as a function of the total concentration 
of A particles, $A_0$ (C is multiplied by 10 for visibility). 
$B_0$ and $C_0$ are fixed, $B_0 = C_0$ = 10^. The maximum of ABC 
for finite $A_0$ is a result of the balance between the increase in the 
number of AB and AC complexes and the decrease in the 
number of available free B and C particles as $A_0$ increases. 

Many proteins take part in more than one complex. 
One might thus wonder what is the optimal concentration 
for these, and how it affects the general correlation 
observed between the concentrations of members of the 
same complex. In order to clarify this issue, we have studied 
a model in which four proteins A, B, C and D bind together 
to form two desired products: the ABC and BCD 
complexes. A and D do not interact, so that there are 
no complexes or sub-complexes of the type AD, ABD, 
ACD and ABCD. Solution of this model (see appendix) 
reveals that the efficiency of the production of ABC and 
BCD is maximized when (for a fixed ratio of $A_0$ and $D_0$) 
$A_0 + D_0 \approx B_0 \approx C_0$. One thus sees, as could have been 
expected, that proteins that are involved in more than 
one complex (like B and C in the above model) will tend 
to have higher concentrations than other members of the 
same complex participating in only one complex. Nevertheless, 
since the protein-protein interaction network 
is scale-free, most proteins take part in a small number 
of complexes, and only a very small fraction participate 
in many complexes. Moreover, given the three orders of 
magnitude spread in protein concentrations (see Fig. 1), 
only proteins participating in a very large number of 
complexes (relative to the average participation) or participating 
in two complexes of very different concentrations 
(i.e., $A_0 \gg D_0$) will result in order-of-magnitude 
deviations from the equal concentration optimum. The 
effect of these relatively few proteins on the average over 
all interacting proteins is small enough not to destroy the 
concentration correlation, as we observed in the experimental 
data. 

In summary, the solution of our simplified complex formation 
model shows that the rate and efficiency of complex 
formation depend strongly, and in a non-obvious 
way, on the relative concentrations of the constituents 
of the complex. The efficiency is maximized when all 
concentrations of the different complex constituents are 
roughly equal. Adding more of the ingredients beyond 
this optimal point not only reduces the efficiency, but 
also results in lower product yield. This unexpected behavior 
is qualitatively explained by a simple argument, 
and is expected to hold generally. Therefore, effective 
formation of complexes in a network puts constraints on 
the concentrations of the underlying building blocks. Accordingly, 
one can understand the tendency of members 
of cellular protein-complexes to have uniform concentra- 
tions, as presented in the previous Section, as a selection 
towards efficiency. 

APPENDIX: TWO COUPLED COMPLEXES 

We consider a model in which four proteins A, B, C 
and D bind together to form two desired products: the 
ABC and BCD complexes. A and D do not interact, 
so that there are no complexes or sub-complexes of the 
type AD, ABD, ACD and ABCD. For simplicity, we 
assume the totally symmetric situation, where all the ratios 
of association coefficients to their corresponding dissociation 
coefficients are equal, i.e., the ratios $k_{d_{x,y}}/k_{a_{x,y}}$ 
are all equal to $X_0$ and $k_{d_{x,y,z}}/k_{a_{x,y,z}} = X_0^2$, where $X_0$ is 
a constant with concentration units. The extension to 
the more general case discussed in the paper is straightforward. 
Using the same scaling as above, the reaction 
equations are solved by the substitutions $AB = A \cdot B$, 
$AC = A \cdot C$, $BC = B \cdot C$, $BD = B \cdot D$, $CD = C \cdot D$, 
$ABC = A \cdot B \cdot C$ and $BCD = B \cdot C \cdot D$, and one needs 
only to solve the material conservation equations, which 
take the form: 

\begin{align}
A + A \cdot B + A \cdot C + A \cdot B \cdot C &= A_0 \tag{A.1} \\
B + A \cdot B + B \cdot C + B \cdot D + A \cdot B \cdot C + B \cdot C \cdot D &= B_0 \tag{A.2} \\
C + A \cdot C + B \cdot C + C \cdot D + A \cdot B \cdot C + B \cdot C \cdot D &= C_0 \tag{A.3} \\
D + B \cdot D + C \cdot D + B \cdot C \cdot D &= D_0 \tag{A.4}
\end{align}

Denoting $\gamma \equiv D_0/A_0$ and $D' \equiv D/\gamma$, Eq. (A.4) becomes 

\begin{align}
D' + D' \cdot B + D' \cdot C + D' \cdot B \cdot C &= A_0 \tag{A.5}
\end{align}

This is exactly the equation we wrote for A, Eq. (A.1), and 
thus $D' = A$, i.e., $D = \gamma A$. Substituting this into Eqs. (A.2) and 
(A.3), one gets 

\begin{align}
B + B \cdot C + (\gamma + 1) A \cdot B + (\gamma + 1) A \cdot B \cdot C &= B_0 \tag{A.6} \\
C + B \cdot C + (\gamma + 1) A \cdot C + (\gamma + 1) A \cdot B \cdot C &= C_0 \tag{A.7}
\end{align}

We now define $A' = (\gamma + 1) A$, $A'_0 = (\gamma + 1) A_0$ and obtain 
from Eqs. (A.1), (A.6) and (A.7) 

\begin{align}
A' + A' \cdot B + A' \cdot C + A' \cdot B \cdot C &= A'_0 \tag{A.8} \\
B + A' \cdot B + B \cdot C + A' \cdot B \cdot C &= B_0 \tag{A.9} \\
C + A' \cdot C + B \cdot C + A' \cdot B \cdot C &= C_0 \tag{A.10}
\end{align}

These are the very same equations that we wrote for 
the 3-particle case where the desired product was ABC. 
Their solution showed that efficiency is maximized at 
$A_0 \approx B_0 \approx C_0$, which here reads $A'_0 \approx B_0 \approx C_0$. 
Since $A'_0 = (\gamma + 1) A_0 = A_0 + D_0$, we thus conclude that in the present 
4-component scenario, the efficiency of ABC and BCD 
(for fixed $\gamma$) is maximized when $(A_0 + D_0) \approx B_0 \approx C_0$. 
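
The reduction of the four-component problem to the three-component one can be verified numerically. The sketch below is an illustration under stated assumptions, not the authors' calculation: it solves the reduced equations by bisection and then checks that the reconstructed concentrations satisfy the four original conservation equations. All function names and the numerical values are hypothetical.

```python
import math


def solve_three_component(a0, b0, c0):
    """Free (A, B, C) solving A(1+B)(1+C) = A0 and its cyclic permutations
    (the symmetric three-component equations, in units of X0)."""

    def b_and_c(a):
        b_eff = b0 / (1.0 + a)
        c_eff = c0 / (1.0 + a)
        p = 1.0 - b_eff + c_eff
        root = math.sqrt(p * p + 4.0 * b_eff)
        b = 2.0 * b_eff / (p + root) if p >= 0 else 0.5 * (root - p)
        return b, c_eff / (1.0 + b)

    lo, hi = 0.0, a0
    for _ in range(200):
        a = 0.5 * (lo + hi)
        b, c = b_and_c(a)
        if a * (1.0 + b) * (1.0 + c) > a0:
            hi = a
        else:
            lo = a
    a = 0.5 * (lo + hi)
    b, c = b_and_c(a)
    return a, b, c


def solve_two_coupled(a0, b0, c0, d0):
    """Free (A, B, C, D) for the ABC + BCD model via the reduction
    A' = (gamma + 1) A with gamma = D0 / A0 and D = gamma * A."""
    gamma = d0 / a0
    a_prime, b, c = solve_three_component((gamma + 1.0) * a0, b0, c0)
    a = a_prime / (gamma + 1.0)
    return a, b, c, gamma * a


def residuals(a, b, c, d, a0, b0, c0, d0):
    """Left-hand side minus right-hand side of Eqs. (A.1)-(A.4)."""
    return (
        a + a * b + a * c + a * b * c - a0,
        b + a * b + b * c + b * d + a * b * c + b * c * d - b0,
        c + a * c + b * c + c * d + a * b * c + b * c * d - c0,
        d + b * d + c * d + b * c * d - d0,
    )


# Illustrative totals chosen so that A0 + D0 = B0 = C0.
totals = (30.0, 100.0, 100.0, 70.0)
free = solve_two_coupled(*totals)
print(free, residuals(*free, *totals))
```

With these totals the reduced problem is the fully matched three-component case, so the free concentrations of B and C coincide, and A and D share the remaining budget in the ratio $1 : \gamma$.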